Protein Remote Homology Detection and Fold Recognition based on Features Extracted from Frequency Profiles
Abstract
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. The performance of an SVM depends on how proteins are vectorized, so a suitable representation of the protein sequence is a key step for SVM-based methods. In this paper, two kinds of profile-level building blocks of proteins, binary profiles and N-nary profiles, are presented; both contain the evolutionary information of the protein sequence frequency profile. The frequency profiles, calculated from the multiple sequence alignments output by PSI-BLAST, are converted into binary profiles or N-nary profiles. Each protein sequence is then transformed into a fixed-dimension feature vector by counting the occurrences of each binary profile or N-nary profile, and the resulting vectors are input to support vector machines. Latent semantic analysis (LSA), an efficient feature extraction algorithm, is adopted to further improve the performance of the methods. Experiments on protein remote homology detection and fold recognition show that the methods based on profile-level building blocks give better results than related methods.
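The vectorization pipeline described in the abstract (frequency profile, then binary profiles, then occurrence counts, then a fixed-dimension vector) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the frequency threshold of 0.25, the amino-acid column order, the rule that a binary profile is labeled by the residues whose frequency passes the threshold, and the names `column`, `binary_profile`, `building_blocks`, `build_vocabulary` and `feature_vector` are all hypothetical choices made here. N-nary profiles, the LSA step and the SVM are not shown.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Assumed residue order for each 20-entry profile column (hypothetical; check
# the residue order of whatever profile file you actually parse).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Profile = List[Sequence[float]]  # one 20-entry column per sequence position


def column(**freqs: float) -> List[float]:
    """Hypothetical helper: build one profile column from e.g. I=0.6, V=0.3."""
    return [freqs.get(aa, 0.0) for aa in AMINO_ACIDS]


def binary_profile(col: Sequence[float], threshold: float = 0.25) -> str:
    """Binarize one column: amino acids with frequency >= threshold get a 1.

    The 1-positions are returned as residue letters (e.g. "IV"), so each
    binary profile doubles as a readable building-block label.
    """
    if len(col) != len(AMINO_ACIDS):
        raise ValueError("expected one frequency per amino acid")
    return "".join(aa for aa, f in zip(AMINO_ACIDS, col) if f >= threshold)


def building_blocks(profile: Profile, threshold: float = 0.25) -> List[str]:
    """Convert an L x 20 frequency profile into L binary profiles."""
    return [binary_profile(col, threshold) for col in profile]


def build_vocabulary(training: Iterable[Profile], threshold: float = 0.25) -> List[str]:
    """Sorted list of every distinct binary profile seen in the training set."""
    vocab = set()
    for profile in training:
        vocab.update(building_blocks(profile, threshold))
    return sorted(vocab)


def feature_vector(profile: Profile, vocab: List[str], threshold: float = 0.25) -> List[int]:
    """Fixed-dimension vector: occurrence count of each vocabulary entry.

    Binary profiles absent from the vocabulary are ignored, so every
    sequence maps to a vector of length len(vocab).
    """
    counts = Counter(building_blocks(profile, threshold))
    return [counts[block] for block in vocab]


if __name__ == "__main__":
    # Two toy "sequences" given directly as frequency profiles.
    seq_a = [column(I=0.6, V=0.3, L=0.1), column(G=1.0), column(I=0.5, V=0.5)]
    seq_b = [column(G=0.9, A=0.1), column(K=0.4, R=0.4, E=0.2)]
    vocab = build_vocabulary([seq_a, seq_b])
    print(vocab)                          # ['G', 'IV', 'KR']
    print(feature_vector(seq_a, vocab))   # [1, 2, 0]
    print(feature_vector(seq_b, vocab))   # [1, 0, 1]
```

Building the vocabulary only from binary profiles observed in the training data is what keeps every vector the same length; profiles never seen in training are simply dropped. In the LSA step named in the abstract, the matrix whose rows are these count vectors would typically be reduced with a truncated singular value decomposition before training the SVM; that step is not included in this sketch.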
Similar resources
Profile-based direct kernels for remote homology detection and fold recognition
MOTIVATION Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS We...
Improving protein fold recognition with hybrid profiles combining sequence and structure evolution
MOTIVATION Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since...
Improving Profile-Profile Alignments via Log Average Scoring
Alignments of frequency profiles against frequency profiles have a wide scope of applications in currently used bioinformatic analysis tools, ranging from multiple alignment methods based on the progressive alignment approach to detecting structural similarities based on remote sequence homology. We present the new log average scoring approach to calculating the score to be used with alignmen...
Recognition of Multiple PQ Issues using Modified EMD and Neural Network Classifier
This paper presents a new framework based on a modified EMD method for detection of single and multiple PQ issues. In modified EMD, DWT precedes the traditional EMD process. This scheme improves EMD by eliminating the mode mixing problem. This is a two-step algorithm; in the first step, the input PQ signal is decomposed into low- and high-frequency components using DWT. In the second stage, the low freq...
Protein Remote Homology Detection Based on Binary Profiles
Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences, which uses the evolutionary information of profiles for efficient remote homology detection. The frequency profiles are directly calculated from the multiple sequence alignments output by PSI-BLAST a...
Journal title:
- JCP
Volume: 6, Issue:
Pages: -
Publication date: 2011